Hybrid Optimized GRU-ECNN Models for Gait Recognition with Wearable IOT Devices

With the advent of the Internet of Things (IoT)


Introduction
In recent years, activity recognition (AR) has witnessed exponential growth in different domains such as healthcare [1], home automation [2], and even criminal activity detection. These methods are adopted with the aim of both improving quality of life and allowing people to live without support from others [3]. In the healthcare system, AR systems are a burgeoning technology mainly designed to detect a patient's mobility during rehabilitation therapy and to monitor physical performance after treatment, with the expectation of improving the patient's quality of life as much as possible.
However, activity data remain complex, which has paved the way for open research on designing intelligent human activity recognition systems. Initially, simple binary sensors were used to design the recognition system [4,5]. More recently, the Internet of Things (IoT) has been used to collect and analyze human activities and gestures [6,7]. These devices are used as wearable devices that can be worn continuously indoors or outdoors while ensuring the privacy and security of the data.
Owing to their pervasiveness and embedded sensor diversity, wearable IoT (WIOT) devices have been commonly used to develop AR systems [8][9][10]. Wearable IoT devices can capture and process activities and behaviors, which are termed gait signals. Accelerometers and gyroscopes are the sensors most frequently fitted to WIOT devices to capture and transmit the gait sequences used for further monitoring. Therefore, these devices allow the extraction of diverse gait information from a person's movement, which can be used to recognize physical activities related to healthcare applications.
Hence, WIOT devices are considered the most important data-capturing unit in AR systems. The collected data are then used to build effective recognition systems. Significant progress in AR systems has been made using conventional machine learning algorithms: decision trees [11][12][13], hidden Markov models [14][15][16], and support vector machines (SVMs) [17][18][19] have been deployed to achieve higher recognition rates. Since these methods are confined to lower-dimensional data spaces, handling larger data requires more efficient learning models to achieve higher performance.
Recently, studies have been migrating toward deep learning algorithms to handle larger amounts of data effectively. Deep learning algorithms such as convolutional neural networks (CNNs) [20,21] and recurrent neural networks (RNNs) [22,23] play a prominent role in developing AR systems. Additionally, hybrid deep learning methods [24][25][26] are gaining research attention in the design of AR systems. However, the collected gait data need to be transformed before deep learning algorithms can achieve better classification at reduced computational cost. Hence, a hybrid combination of algorithms is required to perform the data transformation and achieve high performance with low complexity.
In this context, this paper proposes a new hybrid algorithm that ensembles CNN layers with gated recurrent units (GRUs) and BAT-inspired classification networks. The user-defined CNN is used to extract the spatial features, whereas the GRU is used to extract the temporal features. These features are then fed to complexity-aware BAT-inspired classification networks to achieve better AR classification with low complexity overhead.

Contribution
(1) This paper focuses on the development of novel testbeds based on wearable IoT devices for the effective collection of raw gait data.
(2) This paper also proposes a methodology for restructuring the raw data into a form suitable for training deep learning algorithms for better performance.
(3) This paper proposes a hybrid deep learning algorithm for effective feature extraction with less computational cost and a high gait recognition rate.
(4) Finally, the paper demonstrates the strength of the proposed methodology by conducting experiments on other benchmark datasets and comparing its performance with other existing deep learning-based AR systems.
The rest of the paper is organized as follows: Section 2 presents the related works proposed by various authors. The data collection unit, data preprocessing, and the proposed hybrid model are presented in Section 3. The dataset descriptions, experimentation, results, findings, and analysis are presented in Section 4. Finally, the paper is concluded in Section 5 with future enhancements.

Related Works
Abdullah et al. adopted a neural network for diagnosing human abnormalities from walking styles detected at the lower limbs. These real-time samples are extracted through the Levenberg-Marquardt method, and their artifacts are removed using Butterworth filters in order to train the neural network effectively. The gait data were observed from 5 subjects at distinct speeds of 2.4, 3.2, and 5.4 km/h, and in total 45 instances were used for evaluation [27]. Though the proposed NN achieved better accuracy on the tested data, it is not suitable for dynamic movements, and the range of the tested data is very small.
On the HuGaDB dataset, Saleh et al. used three supervised machine learning models for human activity recognition: random forest, Naive Bayes, and IB1 classifiers. HuGaDB contains data on standing, sitting, running, and walking, monitored using accelerometers and gyroscopes. Random forest outperformed the other two learning models in terms of classification accuracy while requiring less setup time [28]. Moon et al. introduced a multimodel gait identification classifier based on convolutional and recurrent neural networks combined with a support vector feature extractor [29].
Jiang and Yin used the short-time discrete Fourier transform (STDFT) to create a time-frequency-spectral image from time-series signals [30]. A CNN is then used to process the image in order to recognize basic daily movements such as walking and standing. Using a mix of time-frequency-spectral characteristics and CNNs, Laput and Harrison [31] built a fine-grained hand activity sensing system. They were able to classify 25 atomic hand activities performed by 12 participants with 95.2 percent accuracy. The spectral properties can be employed not only for wearable sensor activity recognition but also for device-free activity recognition. For learning modality-specific temporal properties, Ha and Choi [32] proposed a new CNN structure with distinct 1D CNNs for different modalities. Other CNN variants are being studied for efficiently integrating temporal characteristics.
Shen et al. [33] used a gated CNN to recognize everyday activities from audio signals and found it to be more accurate than the naïve CNN. The work in [35] built an RNN using gated recurrent units (GRUs) and applied it to activity recognition. However, some research has found that different types of RNN cells do not perform significantly better than the traditional LSTM cell in terms of classification accuracy [36]. Wang et al. [37] combined a CNN and an LSTM to create a classifier that could automatically extract complex characteristics from sound data and recognize gestures. For local temporal feature extraction at different scales, Xu et al. [38] used the sophisticated Inception CNN structure, whereas GRUs were used for efficient global temporal representations.
To assess more complex temporal hierarchies, Yuta et al. [39] used a dual-stream ConvLSTM network, with one stream handling shorter time lengths and the other longer time lengths. Guo et al. [40] proposed using MLPs to generate a base classifier for each sensory modality and assigning ensemble weights at the classifier level to incorporate all classifiers. The authors not only evaluated recognition accuracy while creating the base classifiers but also emphasized variety by introducing diversity metrics. As a result, the diversity of the different modalities is retained, which is important for overcoming overfitting and enhancing overall generalization capacity.

System Overview.
The proposed framework has four main phases: (i) data collection unit; (ii) data preprocessing and filtering; (iii) spatial and temporal feature extraction using the proposed architecture; and (iv) classification phase. The block diagram of the proposed framework is shown in Figure 1.

Data Collection Unit.
To collect the experimental data, 29 volunteers with body weights ranging from 25 kg to 64 kg were selected. The participants were all healthy, without any neurological disorders, and had no physical injuries to their legs or feet that might have affected walking gait phase detection. With the advancement of Internet of Things (IoT) devices, this work used six battery-powered IoT devices to collect the corresponding inertial information. The accompanying figure shows the placement of the six IoT devices on the participants. To collect the inertial data from the lower limbs, MICOTT boards are used as the main IoT devices. Each board consists of an 8-bit NodeMCU as the main CPU, interfaced with the 10-bit SPI (Serial Peripheral Interface) based MCP3008 analog channels and ESP8266 Wi-Fi transceivers. ADXL435 three-axis accelerometers and BMG250 three-axis gyroscopes are interfaced with the MICOTT boards to collect inertial information from both limbs of the participants. MicroPython programs were deployed on the boards to collect data and transmit them to the cloud. A series of Li-ion batteries with an operating voltage of 3.3 V powers each board and can be replaced when the battery is drained.
During the experimentation, all participants were required to walk normally on the treadmill at different speeds ranging from 0.66 m/s to 1.3 m/s for at least 180 s. All the participants were requested to walk normally for 2 minutes at each speed. The experimental data were collected every 3 minutes, and the collected data were transmitted to the cloud for further processing. In addition, to evaluate the proposed algorithm, we used other public benchmark datasets, namely the whuGait and OU-ISIR datasets; details of the datasets are discussed in Section 4. Figure 2 presents the data collection procedure used in the proposed methodology.

Data Preprocessing Process.
The data samples stored in the cloud contain multiple features from the six IoT devices, and each sample includes acceleration and angular velocity data in the X, Y, and Z directions. The sequence of a data sample is denoted by the following equation:

$$y = \{s_1, s_2, s_3, f_1, f_2, f_3\}$$

where $y$ is the total data sample, $s_1$, $s_2$, and $s_3$ are the accelerometer data, and $f_1$, $f_2$, and $f_3$ are the angular velocity data, which are stored in the cloud. As the above equation indicates, the combined data are stored in the cloud and need segmentation and extraction before further use. The data collected in the cloud are downloaded offline, and data preprocessing steps are applied for effective data separation and extraction. To achieve low computational complexity with high segmentation accuracy, this paper uses the Pearson correlation sliding window technique [41], which combines the Pearson correlation coefficient [42] and sliding window techniques. The value of P plays an important role in the data extraction, in which different thresholds are used for effective data extraction over a period of time. Figure 3 presents the preprocessed data after applying the proposed technique.
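As an illustration of how a Pearson correlation coefficient can be combined with a sliding window, the following minimal Python sketch slides a fixed-length window over a one-dimensional signal and keeps the windows whose correlation with a reference template reaches a threshold. The function names, the template, the threshold value, and the toy sinusoidal signal are assumptions made for illustration; the exact procedure of [41] may differ.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a constant window has no defined correlation
    return cov / math.sqrt(var_a * var_b)


def segment(signal, template, threshold=0.9, step=1):
    """Return the start indices of windows that correlate with the template.

    A window is kept when its Pearson coefficient with `template`
    is at least `threshold` (a hypothetical stand-in for the value P).
    """
    width = len(template)
    starts = []
    for start in range(0, len(signal) - width + 1, step):
        window = signal[start:start + width]
        if pearson(window, template) >= threshold:
            starts.append(start)
    return starts


# Toy example: a periodic, gait-like signal with a period of 8 samples.
template = [math.sin(2 * math.pi * k / 8) for k in range(8)]
signal = [math.sin(2 * math.pi * k / 8) for k in range(32)]
starts = segment(signal, template, threshold=0.99)
print(starts)
```

With this toy signal, only windows aligned with the start of a period pass the threshold, so the returned indices mark candidate cycle boundaries. Lowering the threshold admits windows that are only partially aligned.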

Proposed Hybrid Deep Learning Model.
From the analysis of the performance of the individual models with the fused features, we find that integrating different learning models can lead to better gait signal recognition and classification with less complexity. Hence, we design a hybrid ensemble of deep and machine learning models to learn the combined spatiotemporal features effectively, which leads to high accuracy and low computational complexity. The complete architecture of the proposed hybrid model is shown in Figure 4.

CNN-Based Spatial Feature Extraction.
This paper uses CNN layers as the core spatial feature extractors, whose output acts as the input to the dense learning layers, which are based on optimized extreme learning machines. First, we briefly explain the concept of CNN architectures, which act as the main spatial feature extractor. The convolutional neural network (CNN) is a biologically inspired variant of the multilayer perceptron (MLP). As shown in Figure 5, a CNN is built by connecting several convolution layers and max-pooling operations. Data are processed through these deep layers to produce feature maps, which are finally transformed into a feature vector by passing through an MLP. This is referred to as a fully connected (FC) layer, which performs classification and detection. For an effective spatial feature extractor, this paper uses six convolutional layers, to which the preprocessed collected data are given as input. The CNN layers used in this paper are presented in Table 1.
The ReLU function is used as the activation function in the network. To reduce the risk of the vanishing gradient problem, we apply batch normalization right after the fourth and fifth convolutional layers. The convolutional feature maps for the input x are denoted by the following equation:

$$F = \beta_{\mathrm{ReLU}}(W_1 x + b_1)$$

where $W_1$ is the weight matrix of the layers, $b_1$ is the network's bias weights, and $\beta_{\mathrm{ReLU}}$ is the ReLU activation function. We train the network by initializing the weights randomly, with a learning rate of 0.01 and a momentum of 0.9.
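To make the convolution, ReLU, and max-pooling steps concrete, the following minimal Python sketch applies one 2 × 2 kernel to a small input, passes the result through ReLU, and pools it with a 2 × 2 window. The input values, the kernel `W1`, the bias `b1`, and the helper names are illustrative assumptions, not the trained parameters or the layer configuration of the proposed network.

```python
def conv2d_valid(x, kernel, bias=0.0):
    """Slide `kernel` over `x` without padding (cross-correlation, as in CNNs)."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(len(x) - kh + 1):
        row = []
        for j in range(len(x[0]) - kw + 1):
            total = bias
            for u in range(kh):
                for v in range(kw):
                    total += kernel[u][v] * x[i + u][j + v]
            row.append(total)
        out.append(row)
    return out


def relu(fmap):
    """Element-wise ReLU: negative responses are set to zero."""
    return [[max(0.0, value) for value in row] for row in fmap]


def max_pool(fmap, size=2):
    """Non-overlapping max pooling with a square window."""
    out = []
    for i in range(0, len(fmap) - size + 1, size):
        row = []
        for j in range(0, len(fmap[0]) - size + 1, size):
            row.append(max(fmap[i + u][j + v]
                           for u in range(size) for v in range(size)))
        out.append(row)
    return out


# Hypothetical 5 x 5 input and 2 x 2 kernel, chosen only for illustration.
x = [
    [1.0, 2.0, 0.0, 1.0, 3.0],
    [0.0, 1.0, 2.0, 3.0, 1.0],
    [1.0, 0.0, 1.0, 2.0, 0.0],
    [2.0, 1.0, 0.0, 1.0, 1.0],
    [0.0, 3.0, 1.0, 0.0, 2.0],
]
W1 = [[1.0, -1.0], [-1.0, 1.0]]
b1 = 0.0

F = max_pool(relu(conv2d_valid(x, W1, b1)), size=2)
print(F)  # [[3.0, 0.0], [4.0, 2.0]]
```

The 5 × 5 input yields a 4 × 4 response map after the unpadded convolution and a 2 × 2 feature map after pooling. A full layer would repeat this for every filter and channel.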

GRU-Based Temporal Feature Extraction.
The most important structure used for temporal feature extraction is the GRU module, which receives the data collected from the IoT-cloud systems. Figure 6 shows the structure of the GRU network used in the paper. The GRU network consists of two gates and is considered faster than the LSTM and RNN models [43]. In a commonly used formulation, the GRU is computed as

$$Z_t = \sigma(W_Z [h_{t-1}, x_t] + B_Z)$$
$$r_t = \sigma(W_r [h_{t-1}, x_t] + B_r)$$
$$\tilde{h}_t = \tanh(W_h [r_t \odot h_{t-1}, x_t] + B_h)$$
$$h_t = (1 - Z_t) \odot h_{t-1} + Z_t \odot \tilde{h}_t$$

where $x_t$ is the input feature at the current state, $y_t$ is the output state, $h_t$ is the output of the module at the current instant, $Z_t$ and $r_t$ are the update and reset gates, $W(t)$ is the weights, and $B(t)$ is the bias weights at the current instant (the subscripts Z, r, and h mark the weights and biases of each gate). The temporal feature maps are extracted from the output of this module.

Computational Intelligence and Neuroscience

3.2.6. Classification Layers.
Next, we further propose an optimized single feedforward network, which uses the principle of the extreme learning machine (ELM) to train on the spatiotemporal features obtained from the previous layers. To keep the computational complexity low, this work uses an extreme learning network with an auto-tuning property, whose optimization is done by BAT-inspired principles. The detailed description of the proposed classification layer is given as follows:
(1) ELM Decision and Classification layer: ELM is a kind of neural network that uses a single hidden layer and works on the principle of auto-tuning. ELM exhibits better performance, higher speed, and less computational overhead than other learning models such as support vector machines (SVM), the Bayesian classifier (BC), K-nearest neighbors (KNN), and even random forest (RF).
This kind of neural network uses a single hidden layer, and the hidden layer does not require mandatory tuning. ELM uses the kernel function to yield good accuracy. The major advantages of the ELM are minimal training error and better approximation, since ELM uses auto-tuning of the weight biases and nonzero activation functions. The detailed working mechanism of the ELM is discussed in [44]. The input feature maps of the ELM are denoted by X, where X is the fused spatio-temporal features obtained from the CNN and GRU layers, F is the CNN's spatial feature, and P is the GRU's temporal feature. The ELM output function and the overall training of the ELM are expressed in terms of the following quantities: X(n) is the input fused feature maps; β is the temporal matrix, which is solved by the Moore-Penrose generalized inverse theorem, denoted by X^T; C is a constant; and B and W are the weights and bias factors of the network with the sigmoidal activation function. The proposed network is trained with these features using the sigmoidal activation function. To resolve the computational problems, this paper adds BAT-inspired optimizers to tune the hyperparameters of the proposed ELM classifiers. The working mechanism of the BAT-inspired ELM is discussed as follows.

Table 1: CNN layers (rows as recovered; the first row, the end of row 12, and the column headings are missing).
2    Max-pooling layers-1         2 × 2
3    Conv(2d) Layer-2        2    3 × 3
4    Max-pooling layers-2         2 × 2
5    Conv(2d) Layer-3        2    2 × 2
6    Max-pooling layers-3         1 × 1
7    Conv(2d) Layer-4        2    2 × 2
8    Max-pooling layers-4         1 × 1
9    Conv(2d) Layer-5        2    2 × 2
10   Max-pooling layers-5         1 × 1
11   Conv(2d) Layer-6        2    1 × 1
12   Max-pooling layers-
(2) BAT-Inspired ELM Layers: This section describes the working mechanism of the BAT algorithm over the ELM layers to provide better classification.
(3) Bat Algorithm, an Overview: The standard bat algorithm is based on the echolocation, or bio-sonar, characteristics of microbats. Based on echolocation behavior, Yang [45] (2010) developed the bat algorithm with three idealized rules, the first of which is: (1) All bats use echolocation to sense distance, and they also "know" the difference between food/prey and background barriers in some magical way. Each bat's motion is associated with a velocity $v_i^t$ and a position $x_i^t$ at iteration t, over n iterations in a d-dimensional search space. Among all the bats, the best bat is chosen according to the three rules stated above. The updated velocity $v_i^t$ and position $x_i^t$ under the three rules are given by the following equations:

$$f_i = f_{\min} + (f_{\max} - f_{\min})\beta$$
$$v_i^t = v_i^{t-1} + (x_i^{t-1} - x_*) f_i$$
$$x_i^t = x_i^{t-1} + v_i^t$$

where $\beta \in (0,1)$, $f_{\min}$ is the minimum frequency ($= 0$), $f_{\max}$ is the maximum frequency, which initially depends on the problem statement, and $x_*$ is the current best position among all bats. Each bat is initially allocated a frequency between $f_{\min}$ and $f_{\max}$. Consequently, the bat algorithm can be considered a frequency-tuning algorithm that provides a balanced combination of exploration and exploitation. The emission rates and loudness essentially provide a mechanism for automatic control and auto-zooming into the region with promising solutions.
To obtain a better solution, it is essential to vary the loudness and the pulse emission rate. Since the loudness usually decreases once a bat has found its prey, while the rate of pulse emission increases, the loudness can be chosen as any value of convenience between Amin and Amax, assuming that Amin = 0 means that a bat has just found the prey and temporarily stopped emitting any sound.
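The frequency, velocity, and position updates of the bat algorithm can be sketched in a few lines of Python. The sketch below is a simplified variant under stated assumptions: loudness and pulse rate are kept fixed instead of being varied over time, the local random-walk step size of 0.01 is arbitrary, and the sphere function stands in for a real fitness function. The names `bat_search` and `sphere` are hypothetical.

```python
import random


def sphere(point):
    """Toy objective to minimise: sum of squares, with its minimum at the origin."""
    return sum(c * c for c in point)


def bat_search(objective, dim, n_bats=20, n_iter=100,
               f_min=0.0, f_max=2.0, loudness=0.9, pulse_rate=0.5,
               lower=-5.0, upper=5.0, seed=0):
    """Simplified bat algorithm with fixed loudness and pulse rate."""
    rng = random.Random(seed)
    x = [[rng.uniform(lower, upper) for _ in range(dim)] for _ in range(n_bats)]
    v = [[0.0] * dim for _ in range(n_bats)]
    fit = [objective(p) for p in x]
    best_i = min(range(n_bats), key=lambda i: fit[i])
    best, best_fit = list(x[best_i]), fit[best_i]
    history = [best_fit]

    for _ in range(n_iter):
        for i in range(n_bats):
            beta = rng.random()
            f_i = f_min + (f_max - f_min) * beta          # frequency update
            v[i] = [v[i][d] + (x[i][d] - best[d]) * f_i   # velocity update
                    for d in range(dim)]
            cand = [x[i][d] + v[i][d] for d in range(dim)]  # position update
            if rng.random() > pulse_rate:
                # Local random walk around the current best solution.
                cand = [best[d] + 0.01 * rng.gauss(0.0, 1.0) for d in range(dim)]
            cand = [min(max(c, lower), upper) for c in cand]  # stay in bounds
            cand_fit = objective(cand)
            # Accept the move only if it does not worsen this bat's fitness
            # and a loudness-controlled random test passes.
            if cand_fit <= fit[i] and rng.random() < loudness:
                x[i], fit[i] = cand, cand_fit
            if cand_fit <= best_fit:
                best, best_fit = list(cand), cand_fit
        history.append(best_fit)
    return best, best_fit, history


best, best_fit, history = bat_search(sphere, dim=2)
print(best_fit)
```

Because the best solution is replaced only by candidates that are at least as good, the recorded best fitness never increases from one iteration to the next.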

Advantages of Bat Algorithms.
The major advantages of BAT algorithms are as follows:
(1) Higher efficiency than PSO, GA, and other heuristic algorithms [46]
(2) Faster and more versatile search than SGD [47]
Motivated by these advantages, we propose a new hybrid integration of the BAT algorithm and the ELM training network for better gait classification.

BAT-Inspired ELM Layers.
As discussed in Section 3.2.4, simple bat algorithms are used to optimize the weights of the ELM networks. In this case, the bat's prey-searching mechanism is used as the main mechanism to optimize the weights and hidden layers of the ELM. Initially, these hyperparameters are selected randomly and passed to the ELM training network. The fitness function of the proposed network is given by equation (9). In each iteration, the hyperparameters are calculated using equations (7) and (8). The iteration stops when the fitness function matches equation (9).
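For reference, the textbook form of a single-hidden-layer ELM, which a bat-style search over the input weights would wrap, can be written as follows. This is a sketch of the standard formulation and not necessarily the exact equations of this paper. The hidden-layer output $H$, the target matrix $T$, the activation $g$, and the assignment of the letters $W$ and $B$ to input weights and biases are assumptions.

```latex
% Standard ELM (textbook form; H, T, g and the roles of W and B are assumed symbols)
\begin{align}
  H &= g(XW + B)
      && \text{hidden-layer output for the fused features } X \\
  \beta &= H^{\dagger} T
      && \text{output weights via the Moore-Penrose pseudo-inverse} \\
  \beta &= H^{\top}\left(\frac{I}{C} + H H^{\top}\right)^{-1} T
      && \text{regularized variant with constant } C \\
  \hat{Y} &= H\beta
      && \text{predicted class scores}
\end{align}
```

Under this reading, each bat position would encode one candidate $(W, B)$, and its fitness could be the classification accuracy of $\hat{Y}$ on held-out data. The paper's own fitness function is the one referred to as equation (9).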
Once the input weights are optimized by the BAT algorithm, the proposed classification layer classifies the gait activities with high speed and less computation. The working mechanism of the proposed classification layers is presented in Algorithm 1. The training network uses 30 epochs and a batch size of 40, with 150 hidden layers and a learning rate of 0.001. Table 2 presents the experimental parameters used for training the proposed network. Furthermore, we calculated performance metrics such as accuracy, precision, recall, specificity, and F1-score on the different datasets. Additionally, we calculated the AUC (area under the ROC curve) and the confusion matrix to demonstrate the superiority of the proposed model. The mathematical expressions used for calculating the performance metrics are presented in Table 3. Higher scores indicate better performance. To address the network's overfitting problem and improve generalization, the early stopping method [48] is used in this paper.
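The listed metrics follow directly from the entries of a confusion matrix. The minimal Python sketch below computes them for a single class treated as positive, using the standard definitions. The function name and the counts in the example are hypothetical and are not results from this study.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard metrics from true/false positive and negative counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0          # sensitivity
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
    }


# Hypothetical counts, chosen only to exercise the formulas.
m = binary_metrics(tp=90, fp=10, tn=85, fn=15)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

For a multi-class gait problem, the same function can be applied once per class in a one-vs-rest fashion and the results averaged.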

Experimentation and Evaluation Metrics.
The early stopping method ends the training of the proposed network when the validation performance shows no improvement for N consecutive evaluations. The complete model was developed using open-source TensorFlow version 2.1.0 with Keras and implemented on a PC workstation with an Intel Xeon CPU, an NVIDIA Titan GPU, 16 GB of RAM, and a 3.5 GHz operating frequency.
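The stopping rule described above can be sketched independently of any framework. In the following minimal Python example, `patience` plays the role of N, and the list of validation scores is invented for illustration. The function name is hypothetical.

```python
def early_stopping_epoch(val_scores, patience):
    """Return (stop_epoch, best_score) for a patience-based stopping rule.

    Training stops at the first epoch where the validation score has
    failed to improve on the best value for `patience` epochs in a row.
    If that never happens, the last epoch is returned.
    """
    best = float("-inf")
    stale = 0
    for epoch, score in enumerate(val_scores):
        if score > best:
            best = score
            stale = 0          # improvement: reset the counter
        else:
            stale += 1         # no improvement this epoch
            if stale >= patience:
                return epoch, best
    return len(val_scores) - 1, best


# Invented validation accuracies, one per epoch.
scores = [0.80, 0.85, 0.88, 0.87, 0.88, 0.86, 0.90]
stop_epoch, best_score = early_stopping_epoch(scores, patience=3)
print(stop_epoch, best_score)  # 5 0.88
```

In Keras, a comparable rule is available as the `tf.keras.callbacks.EarlyStopping` callback with its `monitor` and `patience` arguments. The text does not state which implementation or which value of N was used.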

Performance Evaluation of the Proposed Model Using the Different Datasets.
In this part, we conducted experiments using real-time and benchmark datasets. We calculated the ROC curves and confusion matrices of the proposed network model on the different datasets. Figure 7 shows the ROC curves of the proposed model on the different gait datasets. The proposed model achieves an AUC of 0.9880 on the collected raw data, 0.980 on whuGait, and 0.9780 on the OU-ISIR dataset. The proposed network thus shows consistent performance on both the real-time and the public datasets. Figure 8 shows the confusion matrices of the proposed model on the different datasets. The performance metrics of the proposed model are presented in Table 4. From Table 4, it is found that the proposed network exhibits higher performance on the real-time and whuGait datasets. It is also found that the proposed model shows a slight edge in peak performance when handling the OU-ISIR datasets.
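As a reminder of what an AUC value measures, the area under the ROC curve equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. The minimal Python sketch below computes it this way for a binary problem. The labels and scores are invented, the function name is hypothetical, and a multi-class gait task would need a one-vs-rest treatment that is not shown here.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Invented example: three positives and three negatives.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
auc = roc_auc(labels, scores)
print(round(auc, 4))  # 0.8889
```

Here eight of the nine positive/negative pairs are ordered correctly, which gives an AUC of 8/9.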

Comparative Analysis of the Proposed Model with the Other Existing Models.
To demonstrate the superiority of the algorithm, the performance of the proposed model is evaluated against existing hybrid deep learning algorithms such as TL-LSTM [49], 2D-CNN-LSTM [50], DCLSTM [51], Q-BTDNN [52], ATTENTION + CNN [53], CNN + GRU [54], and CNN + SVM [55].

Algorithm 1 (recovered steps):
Assign the bias weights and input layers by (6) and (7)
(10) Calculate the fitness function using equation (9)
(11) If (Fitness function == Maximum Accuracy)
(12) Go to Step 17
(13) Else
(14) Go to Step 8
(15) End

The proposed model shows greater performance than the other existing learning models. The performance of the different learning models on the OU-ISIR datasets is shown in Figures 9-14. From Figures 9-14, it is clear that the inclusion of the BAT-inspired ELM models along with spatio-temporal feature extraction outperforms the other learning models. From the above experiments, it is clear that the proposed model achieves a better AR rate even across multiple datasets.

Computational Complexity.
The computational complexity of the proposed technique is represented in Big-O notation. The different CNN algorithms used for evaluation and complexity analysis are presented in Table 5. From Table 5, it is found that the BAT-optimized classification layer yields lower computational complexity, 10% lower than that of the other existing algorithms.

Conclusion and Future Scope.
In this paper, a novel GRU-fused CNN feature extractor with a BAT-inspired classification layer is developed for better recognition of human gaits for healthcare applications. The real-time datasets were collected using wearable IoT (W-IoT) devices and stored in the cloud for further monitoring and processing. For efficient classification, these data were restructured using the Pearson correlation sliding window method. Then, the restructured data are fed into the two parts of the deep learning model: a user-defined CNN, which extracts the spatial features, and a GRU, which extracts the temporal features. Finally, these spatio-temporal features are fed into the proposed BAT-inspired optimized classifiers for better gait recognition. Extensive experimentation is carried out using the real-time datasets along with public datasets such as the whuGait and OU-ISIR benchmarks. The results demonstrate that the proposed model achieves a better recognition rate and lower computational cost than the other existing hybrid learning models.
For future work, we will implement the proposed gait recognition system on limited hardware resources, even on a smartphone. Besides performance metrics, other parameters such as energy consumption, resource constraint parameters, and computing capability should also be considered for better implementation in real-world scenarios. Furthermore, our gait recognition model can be extended toward human behavior prediction, which can play a vital role in the psychology and crime investigation domains.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request (Sangeethak@kdu.edu.et).

Conflicts of Interest
The authors declare that they have no conflicts of interest regarding the present study.

Authors' Contributions
K. M. Monica conceptualized the study, performed data curation and formal analysis, developed the methodology, provided the software, and wrote the original draft. R. Parvathy supervised the study, reviewed and edited the article, handled project administration, and performed visualization. A. Gayathri performed visualization, investigation, and formal analysis and provided the software. Rajanikanth Aluvalu performed data curation and investigation and provided resources and software. K. Sangeetha supervised the study, reviewed and edited the article, and performed visualization. Chenna Reddy Vijaya Simha Reddy provided the software, performed validation, wrote the original draft, developed the methodology, and supervised the study.